Original author: Dr Sheehan Olver | Compiled by: Isaac Lee | Date: 16/03/2022
GitHub source
MATH50003 Combined Problem Sheet Solutions

Contents:
MATH50003 Problem Sheet 1: 1. Binary representation; 2. Integers; 3. Floating point numbers; 4. Arithmetic; 5. Interval arithmetic
MATH50003 Numerical Analysis: Problem Sheet 2: 1. Finite-differences; 2. Dual numbers; 3. Newton iteration
MATH50003 Numerical Analysis: Problem Sheet 3: 1. Dense matrices; 2. Triangular matrices; 3. Banded matrices; 4. Permutations; 5. Orthogonal matrices
MATH50003 Numerical Analysis: Problem Sheet 4: 1. Least squares and QR decompositions; 2. Gram–Schmidt; 3. Householder reflections; 4. Banded QR with Givens rotations; 5. PLU decomposition
MATH50003 Numerical Analysis: Problem Sheet 5: 1. Positive definite matrices and Cholesky decompositions; 2. Matrix norms; 3. Singular value decomposition
MATH50003 Numerical Analysis: Problem Sheet 6: 1. Condition numbers; 2. Indefinite integration; 3. Euler methods
MATH50003 Numerical Analysis: Problem Sheet 7: 1. Two-point boundary value problems; 2. Convergence (Consistency, Stability); 3. Fourier series

MATH50003 Problem Sheet 1
This problem sheet tests the representation of numbers on the computer, using modular arithmetic. We also use floating point rounding modes to implement interval arithmetic, and thereby produce rigorous bounds on the exponential.
```julia
using ColorBitstring, SetRounding
```
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
Problem 1.1 (C) What is the binary representation of 1/5? (Hint: use printbits to derive a proposed form.)
SOLUTION Note that
```julia
printbits(1/5)
```
Hence we show that
Problem 1.2 (C) What is the Float16 representation of π?
SOLUTION Note that
3 + 1/8 + 1/64, which has the binary representation
```julia
printbits(Float16(π))
```
Instead of simply guessing the above representation we can instead repeatedly subtract the largest power of 2 which does not result in a negative number. For
Problem 2.1 (C) With 8-bit signed integers, find the bits for the following:
SOLUTION
We can find the binary digits by repeatedly subtracting the largest power of 2 less than a number
until we reach 0, e.g.
```julia
printlnbits(Int8(10))
```
Similarly,
Thus the bits are:
```julia
printlnbits(Int8(120))
```
For negative numbers we perform the same trick but adding 2^8 = 256.
Thus the bits are:
```julia
printlnbits(Int8(-10))
```
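The same bit patterns can be checked without the ColorBitstring package, using only the built-in bitstring function (a minimal sketch using Base only):

```julia
# Two's complement: the bits of a negative Int8 x are those of 256 + x
println(bitstring(Int8(10)))   # 00001010
println(bitstring(Int8(120)))  # 01111000
println(bitstring(Int8(-10)))  # 11110110, i.e. the bits of 256 - 10 = 246
```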
Problem 2.2 (C) What will Int8(120) + Int8(10) return?
SOLUTION It will return -126, since the addition overflows modulo 2^8:
```julia
Int8(120) + Int8(10)
```
Problem 3.1 (C) What are the single precision (Float32) floating point representations for the following:
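Since Int8 arithmetic is performed modulo 2^8, the overflow can be verified directly (a small check using only Base functions):

```julia
# 120 + 10 = 130 ≡ 130 - 256 = -126 (mod 2^8)
result = Int8(120) + Int8(10)
println(result) # -126

# the same answer via modular arithmetic on wider integers
println(mod(120 + 10 + 128, 256) - 128) # -126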
Check your answers using printbits.
SOLUTION
Recall that we have σ,Q,S = 127,8,23. Thus we write
The exponent bits are those of
Hence we get
```julia
printlnbits(2f0)
```
We write
And note that
```julia
printlnbits(31f0)
```
On the other hand,
and
```julia
printlnbits(32f0)
```
Note that
and
```julia
printlnbits(23f0/4)
```
Finally,
and
```julia
printlnbits(23f0/4 * 2f0^100)
```
Problem 3.2 (B) What is the gap between 2 and the next largest Float32, and between 1024 and the next largest Float32? Check your answers using the nextfloat command.
SOLUTION
The next float after 2 is larger by 2^(-22):
```julia
nextfloat(2f0) - 2, 2^(-22)
```
Similarly, for 1024 the gap is 2^(-13):
```julia
nextfloat(1024f0) - 1024, 2^(-13)
```
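The spacing between successive Float32 values doubles with each power of two, since the exponent grows while the 23-bit significand stays fixed; a quick sketch confirming the two computations above (eps applied to a value gives its local spacing):

```julia
# spacing near 2 = 2^1 is 2^(1-23) = 2^-22; near 1024 = 2^10 it is 2^(10-23) = 2^-13
@assert nextfloat(2f0) - 2f0 == 2f0^(-22)
@assert nextfloat(1024f0) - 1024f0 == 2f0^(-13)

# equivalently, eps gives the spacing directly
@assert eps(2f0) == 2f0^(-22)
@assert eps(1024f0) == 2f0^(-13)
println("all gaps confirmed")
```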
Problem 4.1 (C) Suppose Float16).
What is the error in approximating
SOLUTION None of these computations have errors since they are all exactly representable as floating point numbers.
Problem 4.2 (B) For what floating point numbers is
SOLUTION
Consider a normal
However, if
and the property will be satisfied if
and the property will be satisfied if the result avoids NaN.)
Here are two examples:
```julia
# normal number with q = 1 and last bit 1
x = reinterpret(Float16, parse(UInt16, "0000010000000011"; base=2))
x/2 == Float64(x)/2 # Float64 can exactly represent x/2
```
```julia
# sub-normal number with q = 1 and last bit 1
x = reinterpret(Float16, parse(UInt16, "0000000000000011"; base=2))
x/2 == Float64(x)/2 # Float64 can exactly represent x/2
```
For the second part, similar to the next problem,
we see that the property holds true if
```julia
x = Float16(2)^(12) - 1 # bits 0110110000000000
x + 2 == x
```
We see this is sharp:
```julia
y = prevfloat(x)
y + 2 == y
```
Problem 4.3 (A) Explain why the following return true. What is the largest floating point number y such that y + 1 ≠ y?
```julia
x = 10.0^100
x + 1 == x
```
SOLUTION
Writing
where the bits
since
The largest floating point number satisfying the condition is
```julia
x = 2.0^53
x + 1 == x
```
We can however successfully construct the previous float y = x - 1 (whereas incrementing x fails):
```julia
y = x - 1
printlnbits(x)
printlnbits(y)
```
And this satisfies:
```julia
y + 1 ≠ y
```
Problem 4.4 (B) What are the exact bits for 1/5 and 1 + 1/5 in half precision (Float16) (using default rounding)?
SOLUTION
We saw above that
where the
```julia
printbits(Float16(1)/5)
```
Adding 1 we get:
Here we write the exponent as
```julia
printbits(1 + Float16(1)/5)
```
Problem 4.5 (A) Explain why the following does not return 1. Can you compute the bits explicitly?
```julia
Float16(0.1) / (Float16(1.1) - 1)
```
SOLUTION For the last problem, note that
hence we have
and
Thus
and hence we get
To compute the bits explicitly, write
We then have
Hence
Therefore we round up (the
```julia
printlnbits(Float16(0.1) / (Float16(1.1) - 1))
```
Problem 4.6 (B) Find a bound on the absolute error in terms of a constant times
implemented using floating point arithmetic (with any precision).
SOLUTION
The first problem is very similar to what we saw in lecture. Write
We first write
where
Then we have
where
Finally,
where
For the second part, we do:
Write
where
using the fact that
where
We also write
where
Then we get
and the error is bounded by:
This is quite pessimistic but still captures that we are on the order of
The following problems consider implementation of interval arithmetic for proving precise bounds on arithmetic operations. That is, recall the set operations
Problem 5.1 (B) For intervals
SOLUTION
Problem 5.2 (B)
We want to implement floating point variants such that, for
This guarantees
Use the formulae from Problem 5.1 to complete (by replacing the # TODO: … comments with code)
the following implementation of an
Interval
so that +, -, and / implement
```julia
# Interval(a,b) represents the closed interval [a,b]
struct Interval{T}
    a::T
    b::T
end

import Base: *, +, -, /, one, in

# create an interval corresponding to [1,1]
one(x::Interval) = Interval(one(x.a), one(x.b))

# Support x in Interval(a,b)
in(x, y::Interval) = y.a ≤ x ≤ y.b

# Following should implement ⊕
function +(x::Interval, y::Interval)
    T = promote_type(typeof(x.a), typeof(x.b))
    a = setrounding(T, RoundDown) do
        # TODO: lower bound
        ## SOLUTION
        x.a + y.a
        ## END
    end
    b = setrounding(T, RoundUp) do
        # TODO: upper bound
        ## SOLUTION
        x.b + y.b
        ## END
    end
    Interval(a, b)
end

# Following should implement ⊘
function /(x::Interval, n::Integer)
    T = typeof(x.a)
    if iszero(n)
        error("Dividing by zero not supported")
    end
    a = setrounding(T, RoundDown) do
        # TODO: lower bound
        ## SOLUTION
        if n > 0
            x.a / n
        else
            x.b / n
        end
        ## END
    end
    b = setrounding(T, RoundUp) do
        # TODO: upper bound
        ## SOLUTION
        if n > 0
            x.b / n
        else
            x.a / n
        end
        ## END
    end
    Interval(a, b)
end

# Following should implement ⊗
function *(x::Interval, y::Interval)
    T = promote_type(typeof(x.a), typeof(x.b))
    if 0 in x || 0 in y
        error("Multiplying with intervals containing 0 not supported.")
    end
    a = setrounding(T, RoundDown) do
        # TODO: lower bound
        ## SOLUTION
        if x.a < 0 && x.b < 0 && y.a < 0 && y.b < 0
            y.b * x.b
        elseif x.a < 0 && x.b < 0 && y.a > 0 && y.b > 0
            x.a * y.b
        elseif x.a > 0 && x.b > 0 && y.a < 0 && y.b < 0
            x.b * y.a
        else
            x.a * y.a
        end
        ## END
    end
    b = setrounding(T, RoundUp) do
        # TODO: upper bound
        ## SOLUTION
        if x.a < 0 && x.b < 0 && y.a < 0 && y.b < 0
            y.a * x.a
        elseif x.a < 0 && x.b < 0 && y.a > 0 && y.b > 0
            x.b * y.a
        elseif x.a > 0 && x.b > 0 && y.a < 0 && y.b < 0
            x.a * y.b
        else
            x.b * y.b
        end
        ## END
    end
    Interval(a, b)
end
```
Problem 5.3 (A) The following function computes the first n+1 terms of the Taylor series of the exponential:
```julia
function exp_t(x, n)
    ret = one(x) # 1 of same type as x
    s = one(x)
    for k = 1:n
        s = s/k * x
        ret = ret + s
    end
    ret
end
```
Bound the tail of the Taylor series, and use this to complete exp_bound which computes
```julia
function exp_bound(x::Interval, n)
    if abs(x.a) > 1 || abs(x.b) > 1
        error("Interval must be a subset of [-1, 1]")
    end
    ret = exp_t(x, n) # the code for Taylor series should work on Interval unmodified
    f = factorial(min(20, n + 1)) # avoid overflow in computing factorial
    T = typeof(ret.a)

    # TODO: modify ret so that exp(x) is guaranteed to lie in it
    ## SOLUTION
    err = setrounding(T, RoundUp) do
        3 / f
    end
    ret + Interval(-err, err)
    ## END
end
```
Check your result for computing
```julia
exp(big(1)) in exp_bound(Interval(1.0,1.0), 20) && exp(big(-1)) in exp_bound(Interval(-1.0,-1.0), 20)
```
Further, ensure that the width of each returned interval is less than
SOLUTION From the Taylor remainder theorem we know the error is
Thus by widening the computation by this error we ensure that we have captured the error by truncating the Taylor series.
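As a sanity check on the truncation bound (not part of the original solution), we can compare the plain Taylor evaluation against a high-precision exponential; exp_t is redefined here so the snippet stands alone, and evaluating in BigFloat isolates the truncation error from rounding:

```julia
# Taylor series of exp truncated after n+1 terms (same exp_t as above)
function exp_t(x, n)
    ret = one(x)
    s = one(x)
    for k = 1:n
        s = s/k * x
        ret = ret + s
    end
    ret
end

# for |x| ≤ 1 the tail after 21 terms is at most 3/21! (geometric-series bound)
err = abs(exp_t(big(1.0), 20) - exp(big(1)))
println(err ≤ 3 / factorial(big(21))) # true: the bound holds
```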
This week we look at other variants of finite-differences, including central differences and second-order finite-differences. We also investigate mathematical properties of dual numbers and extend their implementation to other functions. Finally, we see how dual numbers can be combined with Newton iteration for root finding.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
Problem 1.1⋆ (B) Use Taylor's theorem to derive an error bound for central differences
Find an error bound when implemented in floating point arithmetic, assuming that
SOLUTION
By Taylor's theorem, the approximation around
Subtracting the second expression from the first we obtain
Thus, the error can be bounded by
In floating point we have
Applying Taylor's theorem we get
where
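The bound above suggests the floating-point error of central differences is minimised when h is roughly the cube root of machine epsilon. A small numerical sketch (using f = exp at x = 1 as an illustrative example, not from the sheet):

```julia
# central difference approximation of f'(x)
centraldiff(f, x, h) = (f(x + h) - f(x - h))/(2h)

x = 1.0
hs = (1e-2, cbrt(eps()), 1e-12) # large, near-optimal, and tiny step sizes
errs = [abs(centraldiff(exp, x, h) - exp(x)) for h in hs]
println(errs) # the middle h, near eps()^(1/3) ≈ 6e-6, gives the smallest error
```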
Problem 1.2 (B) Implement central differences for h = 2.0 .^ (0:-1:-60) and h = 10.0 .^ (0:-1:-16).
Derive the error exactly for the different cases to explain the observed behaviour.
SOLUTION
```julia
using Plots

# define the functions
f = x -> 1 + x + x^2
g = x -> 1 + x/3 + x^2

# define analytic first derivatives for comparison
fp = x -> 1 + 2 * x
gp = x -> 1/3 + 2 * x

# central difference derivative approximation
centraldiff(x, h, f) = (f(x + h) - f(x - h))/(2 * h)

# computes an error
centraldifferror(x, h, f, fp) = abs(centraldiff(x, h, f) - fp(x))

# plotting f and g errors
x = 0.0 # some arbitrary point

# helper function to avoid trying to take logs of 0 in plots
nanabs(x) = iszero(x) ? NaN : abs(x)

# We find the error for the derivative of f is 0
# (until we run into the errors for too small h we discussed in the lecture)
h = 2.0 .^ (0:-1:-60)
plot(nanabs.(centraldifferror.(x, h, f, fp)), yaxis=:log10, label="f")
plot!(nanabs.(centraldifferror.(x, h, g, gp)), yaxis=:log10, label="g")
```
```julia
h = 10.0 .^ (0:-1:-16)
plot(nanabs.(centraldifferror.(x, h, f, fp)), yaxis=:log10, label="f")
plot!(nanabs.(centraldifferror.(x, h, g, gp)), yaxis=:log10, label="g")
```
To compute the errors of the central difference approximation of
As we can see, in this case the central difference approximation is exact. The errors we start observing for small step sizes are thus numerical in nature. The values of the function at
To compute the errors of the central difference approximation of
Problem 1.3 (A)⋆ Use Taylor's theorem to derive an error bound on the second-order derivative approximation
Find an error bound when implemented in floating point arithmetic, assuming that
where
SOLUTION
Using the same two formulas as in 1.1 we have
Summing the two we obtain
Thus,
Hence, the error is
In floating point arithmetic, the error is
Problem 1.4 (B) Use finite-differences, central differences, and second-order finite-differences to approximate to 5-digits the first and second
derivatives to the following functions
at the point
where
e.g.:
SOLUTION
```julia
# Forward difference
forwarddiff(x, h, f) = (f(x + h) - f(x))/h

# We already implemented central differences in a previous problem

# Second derivative via finite difference
finitediffsecond(x, h, f) = (f(x + h) - 2 * f(x) + f(x - h))/(h^2)

# Define the functions
f = x -> exp(exp(x)cos(x) + sin(x))
g = x -> prod([x] ./ (1:1000) .- 1)
function cont(n, x)
    ret = 2*one(x)
    for k = 1:n-1
        ret = 2 + (x-1)/ret
    end
    1 + (x-1)/ret
end

# Choose our point
x = 0.1;
```
```julia
# We have to be a bit careful with the choice of h
h = eps()
# Values for exp(exp(x)cos(x) + sin(x))
println("f'($x) with forward difference: ", forwarddiff(x, sqrt(h), f))
println("f'($x) with central difference: ", centraldiff(x, cbrt(h), f))
println("f''($x) via finite difference: ", finitediffsecond(x, cbrt(h), f))
```
```julia
# Values for prod([x] ./ (1:1000) .- 1)
println("f'($x) with forward difference: ", forwarddiff(x, sqrt(h), g))
println("f'($x) with central difference: ", centraldiff(x, cbrt(h), g))
println("f''($x) via finite difference: ", finitediffsecond(x, cbrt(h), g))
```
```julia
# Values for the continued fraction
println("f'($x) with forward difference: ", forwarddiff(x, sqrt(h), x->cont(1000,x)))
println("f'($x) with central difference: ", centraldiff(x, sqrt(h), x->cont(1000,x)))
println("f''($x) via finite difference: ", finitediffsecond(x, cbrt(h), x->cont(1000,x)))
```
Problem 2.1⋆ (C)
Show that dual numbers
SOLUTION
In what follows we write
Additive associativity and commutativity and existence of additive inverse are both
immediate results of dual number addition reducing to element-wise real number addition.
Furthermore, by definition of addition on
We explicitly prove multiplicative commutativity
We also explicitly prove multiplicative associativity:
and
The number
Finally we show distributivity of multiplication:
Problem 2.2⋆ (C) A field is a commutative ring such that
SOLUTION
Fields require that all nonzero elements have a unique multiplicative inverse. However, this is not the case for dual numbers. To give an explicit counter example, we show that there is no dual number
By appropriate multiplication with the conjugate we show that
This proves that no choice of real part
Problem 2.3⋆ (B) A matrix representation of a ring is a map from the ring to matrices such that matrix addition and multiplication behave exactly like addition and multiplication of the ring.
That is, if
Show that the following are matrix representations of complex numbers and dual numbers (respectively):
SOLUTION
Let
Next for multiplication we need
Now we move on to dual numbers
And finally for multiplication of dual numbers
Problem 2.4⋆ (C) What is the correct definition of division on dual numbers, i.e.,
for what choice of
SOLUTION
As with complex numbers, division is easiest to understand by first multiplying with the conjugate, that is:
We saw in the lectures that we have
and
Using these, we find that evaluating
which means
Setting
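With c ≠ 0 this gives (a + bϵ)/(c + dϵ) = a/c + (bc − ad)/c² ϵ. A minimal standalone check (this tiny MiniDual type is introduced here only to verify the formula; it is not the sheet's Dual implementation):

```julia
# minimal dual type just to verify the division formula
struct MiniDual
    a::Float64 # real part
    b::Float64 # dual (ϵ) part
end
Base.:/(x::MiniDual, y::MiniDual) = MiniDual(x.a/y.a, (x.b*y.a - x.a*y.b)/y.a^2)
Base.:*(x::MiniDual, y::MiniDual) = MiniDual(x.a*y.a, x.a*y.b + x.b*y.a)

x = MiniDual(1.0, 2.0)
y = MiniDual(3.0, 4.0)
z = x / y
# multiplying back recovers x, confirming the choice of dual part
w = z * y
println((w.a, w.b)) # ≈ (1.0, 2.0)
```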
Problem 2.5 (C) Add support for cos, sin, and / to the type Dual:
```julia
# Dual(a,b) represents a + b*ϵ
struct Dual{T}
    a::T
    b::T
end

# Dual(a) represents a + 0*ϵ
Dual(a::Real) = Dual(a, zero(a)) # for real numbers we use a + 0ϵ

# Allow for a + b*ϵ syntax
const ϵ = Dual(0, 1)

import Base: +, *, -, /, ^, zero, exp, cos, sin, one

# support polynomials like 1 + x, x - 1, 2x or x*2 by reducing to Dual
+(x::Real, y::Dual) = Dual(x) + y
+(x::Dual, y::Real) = x + Dual(y)
-(x::Real, y::Dual) = Dual(x) - y
-(x::Dual, y::Real) = x - Dual(y)
*(x::Real, y::Dual) = Dual(x) * y
*(x::Dual, y::Real) = x * Dual(y)

# support x/2 (but not yet division of duals)
/(x::Dual, k::Real) = Dual(x.a/k, x.b/k)

# a simple recursive function to support x^2, x^3, etc.
function ^(x::Dual, k::Integer)
    if k < 0
        error("Not implemented")
    elseif k == 1
        x
    else
        x^(k-1) * x
    end
end

# support identity of type Dual
one(x::Dual) = Dual(one(eltype(x.a)))

# Algebraic operations for duals
-(x::Dual) = Dual(-x.a, -x.b)
+(x::Dual, y::Dual) = Dual(x.a + y.a, x.b + y.b)
-(x::Dual, y::Dual) = Dual(x.a - y.a, x.b - y.b)
*(x::Dual, y::Dual) = Dual(x.a*y.a, x.a*y.b + x.b*y.a)

exp(x::Dual) = Dual(exp(x.a), exp(x.a) * x.b)
```
```julia
function cos(x::Dual)
    # TODO: implement cos for Duals
    ## SOLUTION
    Dual(cos(x.a), -sin(x.a) * x.b)
    ## END
end

function sin(x::Dual)
    # TODO: implement sin for Duals
    ## SOLUTION
    Dual(sin(x.a), cos(x.a) * x.b)
    ## END
end

function /(x::Dual, y::Dual)
    # TODO: implement division for Duals
    ## SOLUTION
    if iszero(y.a)
        error("Division for dual numbers is ill-defined when denominator real part is zero.")
    end
    return Dual(x.a / y.a, (y.a * x.b - x.a * y.b) / y.a^2)
    ## END
end
```
Problem 2.6 (B) Use dual numbers to compute the derivatives to
Does your answer match (to 5 digits) Problem 1.4?
SOLUTION
With the previous problems solved, this is as simple as running
```julia
fdual = f(0.1 + ϵ)
fdual.b
```
```julia
gdual = g(0.1 + ϵ)
gdual.b
```
```julia
contdual = cont(1000, 0.1 + ϵ)
contdual.b
```
Newton iteration is an algorithm for computing roots of a function
Problem 3.1 (B) Use Dual to implement the following function which returns
```julia
function newton(f, x0, n)
    ## TODO: compute x_n
    ## SOLUTION
    xk = x0
    for k = 1:n
        fd = f(xk + ϵ)
        xk = xk - fd.a / fd.b
    end
    return xk
    ## END
end
```
SOLUTION
Here is a simple test case for the above function:
```julia
# example
f_ex(x) = x^2 - 3
n = 5
x0 = 1
# compute proposed root
xn = newton(f_ex, x0, n)
# check that obtained point is an approximate root
f_ex(xn)
```
END
Problem 3.2 (B) Compute points
(Hint: you may need to try different x0 and n to get convergence. Plotting the function should give an
indication of a good initial guess.)
SOLUTION
We can plot the functions on a few ranges to get an intuition
```julia
using Plots

f1(x) = exp(exp(x)*cos(x) + sin(x)) - 6
f2(x) = prod([x] ./ range(1,1000) .- 1) - 1/2

plot(f1, -2, 2, color="blue")
plot!(x->0, color="red")
```
```julia
plot(f2, 0, 2, color="blue")
plot!(x->0, color="red")
```
And then use our Newton iteration to compute approximate roots
```julia
rootf1 = newton(f1, 1.5, 5)
f1(rootf1)
```
```julia
rootf2 = newton(f2, 0.3, 8)
f2(rootf2)
```
Problem 3.3 (A) Compute points
SOLUTION
```julia
xn = newton(x->cont(1000,x)-1., 0.5, 10)
cont(1000,xn)-1.
```
```julia
xn = newton(x->cont(1000,x)-2., 0.5, 10)
cont(1000,xn)-2.
```
```julia
xn = newton(x->cont(1000,x)-3., 0.5, 10)
cont(1000,xn)-3.
```
By plotting the function we can conjecture that the continued fraction converges to
```julia
using Plots
plot(x->cont(1000,x), 0, 10)
plot!(x->sqrt(x))
```
There are a handful of ways to prove this conjecture. Here is one - start with
then extend the RHS by
Dividing through
Note that we can now plug this equation into itself on the right hand side to obtain a recursive continued fraction for the square root function.
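The conjecture can also be checked numerically (re-defining cont here so the snippet stands alone; the fixed point of ret = 2 + (x-1)/ret is 1 + √x, so the final step returns √x):

```julia
# continued fraction from the sheet: converges to sqrt(x) for x ≥ 0
function cont(n, x)
    ret = 2*one(x)
    for k = 1:n-1
        ret = 2 + (x-1)/ret
    end
    1 + (x-1)/ret
end

for x in (2.0, 3.0, 10.0)
    println(abs(cont(1000, x) - sqrt(x))) # tiny: agrees with sqrt to rounding
end
```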
This problem sheet explores implementation of triangular solves,
supporting a matrix with two super-diagonals, as well as
permutation matrices and Householder reflections that can be applied to a vector in O(n) operations.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
```julia
using LinearAlgebra, Test

# We will override these functions below
import Base: getindex, setindex!, size, *, \
```
Problem 1.1 (C) Show that A*x is not
implemented as mul(A, x) from the lecture notes
by finding a Float64 example where the bits do not match.
SOLUTION
First we have to define mul(A, x) as in the lecture notes:
```julia
function mul(A, x)
    m,n = size(A)
    c = zeros(eltype(x), m) # eltype is the type of the elements of a vector/matrix
    for j = 1:n, k = 1:m
        c[k] += A[k, j] * x[j]
    end
    c
end
```
Then we can easily find examples, in fact we can write a function that searches for examples:
```julia
using ColorBitstring

function findblasmuldifference(n, l)
    for j = 1:n
        A = randn(l,l)
        x = rand(l)
        if A*x != mul(A,x)
            return (A,x)
        end
    end
end

n = 100 # number of attempts
l = 10  # size of objects
(A,x) = findblasmuldifference(n,l) # find a difference

println("Bits of obtained A*x")
printlnbits.(A*x);
println("Bits of obtained mul(A,x)")
printlnbits.(mul(A,x));
println("Difference vector between the two solutions:")
println(A*x - mul(A,x))
```
Problem 2.1 (B) Complete the following functions for solving linear systems with triangular systems by implementing back and forward-substitution:
```julia
function ldiv(U::UpperTriangular, b)
    n = size(U,1)
    if length(b) != n
        error("The system is not compatible")
    end
    x = zeros(n) # the solution vector
    ## TODO: populate x using back-substitution
end

function ldiv(U::LowerTriangular, b)
    n = size(U,1)
    if length(b) != n
        error("The system is not compatible")
    end
    x = zeros(n) # the solution vector
    ## TODO: populate x using forward-substitution
end
```
SOLUTION
```julia
function ldiv(U::UpperTriangular, b)
    n = size(U,1)
    if length(b) != n
        error("The system is not compatible")
    end
    x = zeros(n) # the solution vector
    for k = n:-1:1 # start with k=n, then k=n-1, ...
        r = b[k] # dummy variable
        for j = k+1:n
            r -= U[k,j]*x[j] # equivalent to r = r - U[k,j]*x[j]
        end
        x[k] = r/U[k,k]
    end
    x
end

function ldiv(U::LowerTriangular, b)
    n = size(U,1)
    if length(b) != n
        error("The system is not compatible")
    end
    x = zeros(n) # the solution vector

    for k = 1:n # start with k=1
        r = b[k] # dummy variable
        for j = 1:k-1
            r -= U[k,j]*x[j]
        end
        x[k] = r/U[k,k]
    end
    x
end
```
Here is an example:
```julia
x = [1,2,3,4]
Ldense = [1 0 0 0; 2 3 0 0; 4 5 6 0; 7 8 9 10]
Ltriang = LowerTriangular(Ldense)
```
```julia
Ldense\x - ldiv(Ltriang,x)
```
Problem 2.2⋆ (B) Given
such that:
What does
SOLUTION
By straightforward computation we find
and thus we find such a lower triangular
Problem 3.1 (C) Complete the implementation of UpperTridiagonal which represents a banded matrix with
bandwidths
```julia
struct UpperTridiagonal{T} <: AbstractMatrix{T}
    d::Vector{T}   # diagonal entries
    du::Vector{T}  # super-diagonal entries
    du2::Vector{T} # second-super-diagonal entries
end

size(U::UpperTridiagonal) = (length(U.d), length(U.d))

function getindex(U::UpperTridiagonal, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2
    # TODO: return U[k,j]
end

function setindex!(U::UpperTridiagonal, v, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2
    if j > k+2
        error("Cannot modify off-band")
    end

    # TODO: modify d,du,du2 so that U[k,j] == v
    U # by convention we return the matrix
end
```
SOLUTION
```julia
struct UpperTridiagonal{T} <: AbstractMatrix{T}
    d::Vector{T}   # diagonal entries
    du::Vector{T}  # super-diagonal entries
    du2::Vector{T} # second-super-diagonal entries
end

size(U::UpperTridiagonal) = (length(U.d), length(U.d))

function getindex(U::UpperTridiagonal, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2

    if j == k+2
        return U.du2[k]
    elseif j == k+1
        return U.du[k]
    elseif j == k
        return U.d[k]
    else # off band entries are zero
        return zero(eltype(U))
    end
end

function setindex!(U::UpperTridiagonal, v, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2
    if (j > k+2) || (j < k)
        error("Cannot modify off-band")
    end

    if j == k+2
        U.du2[k] = v
    elseif j == k+1
        U.du[k] = v
    elseif j == k
        U.d[k] = v
    end

    U # by convention we return the matrix
end
```
We can check that the above methods to read and write entries work:
```julia
A = UpperTridiagonal([1,2,3,4], [1,2,3], [1,2])
```
```julia
A[1,1] = 2
A
```
Problem 3.2 (B) Complete the following implementations of * and \ for UpperTridiagonal so that
they take only
```julia
function *(U::UpperTridiagonal, x::AbstractVector)
    T = promote_type(eltype(U), eltype(x)) # make a type that contains both the element type of U and x
    b = zeros(T, size(U,1)) # returned vector
    # TODO: populate b so that U*x == b (up to rounding)
end

function \(U::UpperTridiagonal, b::AbstractVector)
    T = promote_type(eltype(U), eltype(b)) # make a type that contains both the element type of U and b
    x = zeros(T, size(U,2)) # returned vector
    # TODO: populate x so that U*x == b (up to rounding)
end
```
SOLUTION
```julia
function *(U::UpperTridiagonal, x::AbstractVector)
    T = promote_type(eltype(U), eltype(x))
    b = zeros(T, size(U,1)) # returned vector
    n = size(U)[1]
    for k = 1:n-2
        b[k] = dot(U[k,k:k+2], x[k:k+2])
    end
    # the last two rows need a bit more care
    b[n-1] = dot(U[n-1,n-1:n], x[n-1:n])
    b[n] = U[n,n]*x[n]
    return b
end

function \(U::UpperTridiagonal, b::AbstractVector)
    T = promote_type(eltype(U), eltype(b))
    x = zeros(T, size(U,2)) # returned vector
    n = size(U)[1]
    if length(b) != n
        error("The system is not compatible")
    end
    for k = n:-1:1 # start with k=n, then k=n-1, ...
        r = b[k] # dummy variable
        for j = k+1:min(k+2,n) # only the two super-diagonals are non-zero
            r -= U[k,j]*x[j]
        end
        x[k] = r/U[k,k]
    end
    x
end
```
And here is an example of what we have implemented in action:
```julia
Abanded = UpperTridiagonal([1.1,2.2,3.3,4.4], [1.9,2.8,3.7], [1.5,2.4])
Adense = Matrix(Abanded) # one of many easy ways to convert to dense storage

Adense == Abanded
```
```julia
x = [5.2, 3/4, 2/3, 9.1415]
Adense*x
```
```julia
Abanded*x
```
```julia
Adense\x
```
```julia
Abanded\x
```
And just for fun, let's do a larger scale dense speed comparison
```julia
using BenchmarkTools
n = 10000
Abanded = UpperTridiagonal(rand(n), rand(n-1), rand(n-2))
Adense = Matrix(Abanded) # one of many easy ways to convert to dense storage
x = rand(n)

@btime Adense*x;
```
```julia
@btime Abanded*x;
```
```julia
@btime Adense\x;
```
```julia
@btime Abanded\x;
```
Problem 4.1⋆ (C) What are the permutation matrices corresponding to the following permutations?
SOLUTION
Let
Either way, we have
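Any such permutation matrix can be built (and checked) by permuting the rows of the identity; a short sketch with a small example permutation chosen for illustration:

```julia
using LinearAlgebra

p = [2, 3, 1]  # example: sends (v₁, v₂, v₃) to (v₂, v₃, v₁)
P = I(3)[p, :] # permutation matrix obtained by permuting rows of the identity
v = [10, 20, 30]
println(P * v) # [20, 30, 10]
```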
Problem 4.2 (B) Complete the implementation of a type representing
permutation matrices that supports P[k,j] and such that * takes
xxxxxxxxxxusing LinearAlgebra, Test
```julia
struct PermutationMatrix <: AbstractMatrix{Int}
    p::Vector{Int} # represents the permutation whose action is v[p]
    function PermutationMatrix(p::Vector)
        sort(p) == 1:length(p) || error("input is not a valid permutation")
        new(p)
    end
end

function size(P::PermutationMatrix)
    (length(P.p), length(P.p))
end
function getindex(P::PermutationMatrix, k::Int, j::Int)
    # P.p[k] == j ? 1 : 0
    if P.p[k] == j
        1
    else
        0
    end
end
function *(P::PermutationMatrix, x::AbstractVector)
    x[P.p]
end

# If your code is correct, this "unit test" will succeed
p = [1, 4, 2, 5, 3]
P = PermutationMatrix(p)
@test P == I(5)[p,:]
```
The following codes show that * takes
```julia
using BenchmarkTools, Random
x = 0 # without this line an "UndefVarError: x not defined" occurs; p and P are previously defined so they work fine
for n = 10 .^ (1:7)
    print("n = ", n)
    p = randperm(n)
    P = PermutationMatrix(p)
    x = randn(n)
    @btime P*x
end
```
SOLUTION
∎
Problem 5.2⋆ (B) Show that the eigenvalues
SOLUTION
Let
∎
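The claim can also be observed numerically: the eigenvalues of a random orthogonal matrix (here the Q factor of a QR decomposition, an illustrative construction) all have modulus 1 up to rounding:

```julia
using LinearAlgebra

Q = Matrix(qr(randn(5,5)).Q) # a random 5×5 orthogonal matrix
λ = eigvals(Q)
println(maximum(abs.(abs.(λ) .- 1))) # ≈ 0: all eigenvalues lie on the unit circle
```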
Problem 5.3⋆ (A) Explain why an orthogonal matrix
SOLUTION
Note that
since
Problem 5.4 (B) Complete the implementation of a type representing
reflections that supports Q[k,j] and such that * takes
```julia
using LinearAlgebra, Test

# Represents I - 2v*v'
struct Reflection{T} <: AbstractMatrix{T}
    v::Vector{T}
end

Reflection(x::Vector{T}) where T = Reflection{T}(x/norm(x))

function size(Q::Reflection)
    (length(Q.v), length(Q.v))
end
function getindex(Q::Reflection, k::Int, j::Int)
    (k == j) - 2Q.v[k]*Q.v[j] # note true is treated like 1 and false like 0
end
function *(Q::Reflection, x::AbstractVector)
    x - 2*Q.v * dot(Q.v, x) # (Q.v'*x) also works instead of dot
end
```
```julia
x = randn(5)
Q = Reflection(x)
v = x/norm(x)
@test Q == I - 2v*v'
@test Q*v ≈ -v
@test Q'Q ≈ I
```
The following codes show that `*` takes $O(n)$ of both time and space.
```julia
using BenchmarkTools, Random
for n = 10 .^ (1:7)
    print("n = ", n)
    x = randn(n)
    Q = Reflection(x)
    v = randn(n)
    @btime Q*v
end
```
Problem 5.5 (C) Complete the following implementation of a Householder reflection so that the
unit tests pass. Here s == true means the Householder reflection is sent to the positive axis and s == false is the negative axis.
```julia
function householderreflection(s::Bool, x::AbstractVector)
    y = copy(x) # don't modify `x`
    if s
        y[1] -= norm(x)
    else
        y[1] += norm(x)
    end
    Reflection(y)
end

x = randn(5)
Q = householderreflection(true, x)
@test Q isa Reflection
@test Q*x ≈ [norm(x); zeros(eltype(x), length(x)-1)]

Q = householderreflection(false, x)
@test Q isa Reflection
@test Q*x ≈ [-norm(x); zeros(eltype(x), length(x)-1)]
```
Problem 5.6⋆ (A) Consider a Householder reflection with
SOLUTION
Since
Numerically, let the length of the significand be
where
Since
In conclusion, it's very accurate to compute
```julia
using Plots
S = precision(Float64) - 1
relative_error = zeros(S)
for n = 1:S
    h = 2.0^(-n)
    exact = 1 + sqrt(1 + BigFloat(h)^2)
    numerical = 1 + sqrt(1 + h^2)
    relative_error[n] = abs(numerical - exact)/exact
end
println(eps())
maximum(relative_error)
```
If
If
If
Let us verify these conclusions:
```julia
using Plots
S = precision(Float64) - 1
relative_error = zeros(S)
for n = 1:S
    h = 2.0^(-n)
    exact = 1 - sqrt(1 + BigFloat(h)^2)
    numerical = 1 - sqrt(1 + h^2)
    relative_error[n] = -abs(numerical - exact)/exact
end
plot(1:S, log2.(relative_error), xlabel="n", ylabel="relative error (log2)")
```
From this plot we can clearly identify the 3 phases:
The transition point between phase 1 and 2 is at the stage for
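The cancellation in 1 - sqrt(1+h²) can be sidestepped by the standard rewriting 1 - sqrt(1+h²) = -h²/(1 + sqrt(1+h²)); a sketch comparing the two forms for one small h (chosen here for illustration):

```julia
h = 2.0^(-40)
naive = 1 - sqrt(1 + h^2)           # catastrophic cancellation: 1 + h^2 rounds to 1, so this is 0.0
stable = -h^2 / (1 + sqrt(1 + h^2)) # ≈ -h²/2 = -2^-81, retaining full relative accuracy
println((naive, stable))
```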
This problem sheet explores least squares, the QR decomposition including for tridiagonal matrices, and the PLU decompositions.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
```julia
using LinearAlgebra, Plots, Test
```
Problem 1.1 (B) Find and plot the best least squares fit of
SOLUTION
```julia
x = range(0, 1; length=1000)
pl = plot()
f = 1 ./ (5x.^2 .+ 1)
for n = 0:10
    # Creates 1000 × (n+1) matrix with entries x[k]^(j-1)
    A = x .^ (0:n)'
    c = A \ f
    plot!(x, A*c)
end
pl
```
```julia
# Here is the same approach as above but this time explicitly using QR as discussed in the lecture notes:
x = range(0, 1; length=1000)
pl = plot()
f = 1 ./ (5x.^2 .+ 1)
for n = 0:10
    # Creates 1000 × (n+1) matrix with entries x[k]^(j-1)
    A = x .^ (0:n)'
    (Q, R) = qr(A)
    c = R \ Q[:, 1:n+1]' * f
    plot!(x, A * c)
end
pl
```
END
Problem 1.2⋆ (B) Show that every matrix has a QR decomposition such that the diagonal of
SOLUTION
Beginning with the square case, a square matrix
The same argument works for the non-square cases as long as we take care to consider the appropriate dimensions and pad with the identity matrix. Note that
Finally, assume that
Since the matrix
Here are some practical examples in Julia:
```julia
E = [-1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]
R = [-1 3; 0 2; 0 0; 0 0]
```
```julia
E*R
```
```julia
E = [-1 0 0 0; 0 1 0 0; 0 0 -1 0; 0 0 0 1]
R = [-1 3 7 8 2 1; 0 2 9 2 1 9; 0 0 -7 2 1 9; 0 0 0 7 5 2]
```
```julia
E*R
```
END
Problem 1.3⋆ (B) Show that the QR decomposition of a square invertible matrix is unique,
provided that the diagonal of
SOLUTION
Assume there is a second decomposition also with positive diagonal
Then we know
Note
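In practice Julia's qr may return an R factor with negative diagonal entries; the unique decomposition with positive diagonal can be recovered by flipping signs with a diagonal matrix of ±1 (a minimal sketch, assuming A is invertible so the diagonal of R is non-zero):

```julia
using LinearAlgebra

A = randn(4,4)
F = qr(A)
Q, R = Matrix(F.Q), F.R
# E has ±1 on the diagonal, chosen so that E*R has positive diagonal;
# since E*E == I, replacing (Q,R) by (Q*E, E*R) leaves the product unchanged
E = Diagonal(sign.(diag(R)))
Q, R = Q*E, E*R
println(all(diag(R) .> 0)) # true
println(Q*R ≈ A)           # true
```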
Problem 2.1⋆ (B) The modified Gram–Schmidt algorithm is a slight variation of Gram–Schmidt where instead of computing
we compute it step-by-step:
Show that
SOLUTION
Recall that the column vectors of
END
Problem 2.2 (B) Complete the following function implementing the modified Gram–Schmidt algorithm:
```julia
function modifiedgramschmidt(A)
    m,n = size(A)
    m ≥ n || error("Not supported")
    R = zeros(n,n)
    Q = zeros(m,n)
    for j = 1:n
        # TODO: Implement the modified Gram–Schmidt algorithm
    end
    Q,R
end

A = randn(300,300)
Q,R = modifiedgramschmidt(A)
@test A ≈ Q*R
@test Q'Q ≈ I
```
SOLUTION
```julia
function modifiedgramschmidt(A)
    m,n = size(A)
    m ≥ n || error("Not supported")
    R = zeros(n,n)
    Q = zeros(m,n)
    # looping through the columns
    for j = 1:n
        v = A[:,j]
        # now go through each row
        for k = 1:j-1
            # take inner product with q_k and v
            R[k, j] = Q[:, k]' * v
            v = v - R[k, j] * Q[:, k]
        end
        R[j,j] = norm(v)
        Q[:,j] = v/R[j,j]
    end
    Q,R
end

A = randn(300,300)
Q,R = modifiedgramschmidt(A)
@test A ≈ Q*R
@test Q'Q ≈ I
```
END
Problem 2.3 (B) Compare the orthogonality of Q between gramschmidt and modifiedgramschmidt
when applied to a 300 × 300 random matrix.
SOLUTION
For reference, here is the gramschmidt algorithm from the lecture notes:
```julia
function gramschmidt(A)
    m,n = size(A)
    m ≥ n || error("Not supported")
    R = zeros(n,n)
    Q = zeros(m,n)
    for j = 1:n
        for k = 1:j-1
            R[k,j] = Q[:,k]'*A[:,j]
        end
        v = A[:,j] - Q[:,1:j-1]*R[1:j-1,j]
        R[j,j] = norm(v)
        Q[:,j] = v/R[j,j]
    end
    Q,R
end

# Now compare MGS with GS:
A = randn(300,300)
Q,R = gramschmidt(A)
Qm,Rm = modifiedgramschmidt(A)

maximum(abs.(Q*Q') - I)
```
Modified Gram–Schmidt is more accurate for constructing an orthogonal matrix:
```julia
maximum(abs.(Qm*Qm') - I)
```
END
Problem 3.1 (B)
Complete the definition of Reflections which supports a sequence of reflections,
that is,
where the vectors are stored as a matrix V whose
struct Reflections{T} <: AbstractMatrix{T}
    V::Matrix{T}
end
import Base: *, size, getindex
size(Q::Reflections) = (size(Q.V,1), size(Q.V,1))
function *(Q::Reflections, x::AbstractVector)
    T = eltype(Q)
    r = Vector{T}(x) # convert x to a vector of type T
    # TODO: Apply Q in O(mn) operations by applying
    # the reflection corresponding to each column of Q.V to r
    ## SOLUTION
    m,n = size(Q.V)
    for j = n:-1:1
        v = Q.V[:, j]
        r = r - 2 * v * dot(v, r)
    end
    ## END
    r
end
function getindex(Q::Reflections, k::Int, j::Int)
    # TODO: Return Q[k,j] in O(mn) operations (hint: use *)
    ## SOLUTION
    T = eltype(Q.V)
    m,n = size(Q)
    ej = zeros(T, m)
    ej[j] = one(T)
    return (Q*ej)[k]
    ## END
end
Y = randn(5,3)
V = Y * Diagonal([1/norm(Y[:,j]) for j=1:3])
Q = Reflections(V)

Q ≈ (I - 2V[:,1]*V[:,1]')*(I - 2V[:,2]*V[:,2]')*(I - 2V[:,3]*V[:,3]')
Q'Q ≈ I

Problem 3.2 (B) Complete the following function that implements
Householder QR using only
function householderqr(A)
    m,n = size(A)
    if m < n
        error("Only support more rows than columns")
    end
    # R begins as A, modify it in place
    R = copy(A)
    Q = Reflections(Matrix(1.0I, m, n))
    for j = 1:n
        # TODO: populate Q and R using O(m*(n-j)) operations
        ## SOLUTION
        y = copy(R[j:end, j])
        y[1] += ((y[1] ≥ 0) ? 1 : -1) * norm(y)
        w = y / norm(y)
        Q.V[j:end, j] = w
        R[j:end, :] = R[j:end, :] - 2 * w * (w' * R[j:end, :])
        ## END
    end
    Q,R
end
A = randn(6,4) # note householderqr requires at least as many rows as columns
Q,R = householderqr(A)

Q*R ≈ A
Q'Q ≈ I

SOLUTION
Here we sketch some additional notes comparing performance.
# Here's a performance comparison with what we saw in the lecture notes:
function householderreflection(x)
    y = copy(x)
    # we cannot use sign(x[1]) in case x[1] == 0
    if x[1] ≥ 0
        y[1] += norm(x)
    else
        y[1] -= norm(x)
    end
    w = y / norm(y)
    I - 2 * w * w'
end

function lecturehouseholderqr(A)
    m,n = size(A)
    R = copy(A)
    Q = Matrix(1.0I, m, m)
    for j = 1:n
        Qⱼ = householderreflection(R[j:end,j])
        R[j:end,:] = Qⱼ*R[j:end,:]
        Q[:,j:end] = Q[:,j:end]*Qⱼ
    end
    Q,R
end
using BenchmarkTools
for j = 1:5
    A = randn(j*100, j*100)
    lecturehouseholderqr(A);
end

for j = 1:5
    A = randn(j*100, j*100)
    householderqr(A);
end

END
Problem 4.1⋆ (A) Describe an algorithm for computing the QR decomposition of a tridiagonal matrix using rotations instead of reflections to upper-triangularise column-by-column.
SOLUTION
Let
where each
Recall that,
and that,
is a rotation matrix where
where
Then we take,
where
Now, for
where
This gives,
for
Finally we define,
so that
END
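The single rotation step used in this algorithm can be sketched numerically. Given a pivot a and the entry b below it that we want to eliminate (the values here are illustrative, not from the sheet), the angle θ = atan(b, a) gives a rotation that zeros out b:

```julia
using LinearAlgebra

a, b = 3.0, 4.0
θ = atan(b, a)          # angle determined by the entry to eliminate
c, s = cos(θ), sin(θ)
G = [c s; -s c]         # 2×2 rotation acting on two consecutive rows
G * [a, b]              # ≈ [5.0, 0.0]: the lower entry is zeroed
```

The first component becomes sqrt(a^2 + b^2), which is why the resulting R has nonnegative diagonal.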
Problem 4.2 (B) Implement Rotations which represents an orthogonal matrix Q that is a product
of rotations of angle θ[k], each acting on the entries k:k+1. That is, it returns
struct Rotations{T} <: AbstractMatrix{T}
    θ::Vector{T}
end
import Base: *, size, getindex
size(Q::Rotations) = (length(Q.θ)+1, length(Q.θ)+1)
function *(Q::Rotations, x::AbstractVector)
    T = eltype(Q)
    y = convert(Vector{T}, x)
    # TODO: Apply Q in O(n) operations, modifying y in-place
    ## SOLUTION
    θ = Q.θ
    # computes Q1*…*Qn*x
    for k = length(θ):-1:1
        # 4 ops to make the 2×2 matrix and 12 for the matrix-vector
        # multiplication, so O(1) work per rotation and O(n) in total
        c, s = cos(θ[k]), sin(θ[k])
        y[k:(k+1)] = [c -s; s c] * y[k:(k+1)]
    end
    ## END
    y
end
function getindex(Q::Rotations, k::Int, j::Int)
    # TODO: Return Q[k,j] in O(n) operations (hint: use *)
    ## SOLUTION
    # recall that A[k,j] == e_k'*A*e_j for any matrix A,
    # so if we use * above, this takes O(n) operations
    n = size(Q)[1]
    ej = zeros(eltype(Q), n)
    ej[j] = 1
    # note: ej must be a Vector, not a Matrix,
    # otherwise * above will not be dispatched
    Qj = Q * ej
    Qj[k]
    ## END
end
θ1 = randn(5)
Q = Rotations(θ1)

Q'Q ≈ I
Rotations([π/2, -π/2]) ≈ [0 0 -1; 1 0 0; 0 -1 0]

Problem 4.3 (A) Combine Rotations and UpperTridiagonal from last problem sheet
to implement a banded QR decomposition, bandedqr, using atan(y, x)
to determine the angle of rotation.
# First we include UpperTridiagonal from last problem sheet.
# bandedqr is below.
import Base: *, size, getindex, setindex!

struct UpperTridiagonal{T} <: AbstractMatrix{T}
    d::Vector{T}   # diagonal entries
    du::Vector{T}  # super-diagonal entries
    du2::Vector{T} # second-super-diagonal entries
end
size(U::UpperTridiagonal) = (length(U.d),length(U.d))
function getindex(U::UpperTridiagonal, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2
    # TODO: return U[k,j]
    if j - k == 0
        d[j]
    elseif j - k == 1
        du[k]
    elseif j - k == 2
        du2[k]
    else
        0
    end
end
function setindex!(U::UpperTridiagonal, v, k::Int, j::Int)
    d,du,du2 = U.d,U.du,U.du2
    # TODO: modify d,du,du2 so that U[k,j] == v
    if j - k == 0
        d[k] = v
    elseif j - k == 1
        du[k] = v
    elseif j - k == 2
        du2[k] = v
    else
        error("Cannot modify off-band")
    end
    U # by convention we return the matrix
end
function bandedqr(A::Tridiagonal)
    n = size(A, 1)
    Q = Rotations(zeros(n - 1)) # assume Float64
    R = UpperTridiagonal(zeros(n), zeros(n - 1), zeros(n - 2))

    R[1, 1:2] = A[1, 1:2]
    for j = 1:n-1
        # angle of rotation
        Q.θ[j] = atan(A[j+1, j], R[j, j])
        θ = -Q.θ[j] # rotate in opposite direction
        c, s = cos(θ), sin(θ)
        # [c -s; s c] represents the rotation that introduces a zero.
        ## TODO: modify rows j and j+1 of R to represent
        # applying the rotation. Note you will need to use row j+1 of A.
        ## SOLUTION
        # This is [c -s; s c] applied to the j-th column, ignoring the
        # second row, which is zero
        R[j, j] = c * R[j, j] - s * A[j+1, j]
        # This is [c -s; s c] applied to the (j+1)-th column
        R[j:j+1, j+1] = [c -s; s c] * [R[j, j+1]; A[j+1, j+1]]
        if j < n - 1
            # This is [c -s; s c] applied to the (j+2)-th column, where R is still zero
            R[j:j+1, j+2] = [-s; c] * A[j+1, j+2]
        end
        ## END
    end
    Q, R
end
A = Tridiagonal([1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4])
Q, R = bandedqr(A)

Q*R ≈ A

Problem 4.4⋆ (B) Could one redesign the above to use only IEEE operations (addition, multiplication, square roots),
avoiding calls to atan, cos, and sin?
Would it have been possible to implement this algorithm using reflections?
If so, what would be the structure of a matrix whose columns are the vectors of reflections?
SOLUTION
Yes, one could avoid using atan, cos, and sin. At each stage of the algorithm, we only use atan to compute θ[j], which we then use to compute cos(θ[j]) and sin(θ[j]); these two values can instead be obtained by normalising the pair of entries being rotated, which requires only addition, multiplication, division, and a square root.
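Concretely (a sketch with illustrative values), the rotation coefficients can be formed from the two entries directly, with no trigonometric calls:

```julia
a, b = 3.0, 4.0          # pivot and the subdiagonal entry to eliminate
r = sqrt(a^2 + b^2)      # only IEEE operations: ^, +, sqrt
c, s = a/r, -b/r         # equal to cos(-atan(b, a)) and sin(-atan(b, a))
[c -s; s c] * [a, b]     # ≈ [5.0, 0.0], the same rotation as before
```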
END
Problem 5.1⋆ (C) Compute the PLU decompositions for the following matrices:
SOLUTION Part 1 Compute the PLU decomposition of,
We begin by permuting the first and second row as |2| > |0|, hence,
We then choose,
There is no need to permute at this stage, so
Since
Hence, we have
We can quickly check:
P = [0 1 0; 1 0 0; 0 0 1]
L = [1 0 0; 0 1 0; 1/2 -1 1]
U = [2 6 1; 0 2 1; 0 0 9/2]
P*L*U

Part 2
Find the
We see that we must start with the permutation,
We then choose
We then choose
We then choose
We then choose
We complete the triangularisation by choosing
We now consider,
These can be computed via,
to obtain,
This gives us,
from which it is clear that,
Finally, we have that,
so that
To check:
P = [0 0 0 1; 0 0 1 0; 1 0 0 0; 0 1 0 0]
L = [1 0 0 0; 1/3 1 0 0; -2/3 2/11 1 0; -1/3 1/11 1/2 1]
U = [-3 -5 6 1; 0 11/3 6 -7/3; 0 0 10/11 23/11; 0 0 0 -1/2]
P*L*U

END
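These hand computations can also be cross-checked against the lu function from LinearAlgebra, which performs the same partial-pivoting elimination (note Julia's convention is P*A = L*U rather than A = P*L*U). The matrix below is the one recovered by multiplying out Part 1's factors:

```julia
using LinearAlgebra

A = Float64[0 2 1; 2 6 1; 1 1 4]  # recovered from P*L*U in Part 1
F = lu(A)                          # partial-pivoting PLU from the standard library
F.P * A ≈ F.L * F.U                # Julia's convention: P*A = L*U
```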
This problem sheet explores positive definite matrices, Cholesky decompositions, matrix norms, and the singular value decomposition.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
using LinearAlgebra, Plots, Test

Problem 1.1⋆ (C) Use the Cholesky decomposition to determine which of the following matrices are symmetric positive definite:
SOLUTION
A matrix is symmetric positive definite (SPD) if and only if it has a Cholesky decomposition, so the task here is really just to compute Cholesky decompositions (by hand). Since our goal is only to tell whether the Cholesky decompositions exist, we do not have to compute the factors in full: it suffices to check that every pivot, i.e. the number under each square root, is positive.
Matrix 1
Matrix 2
Matrix 3
Matrix 4
We can check that we did this correctly by running the following in Julia:
cholesky([1 -1; -1 3])

# this throws an error when uncommented and run because the matrix is not SPD
# cholesky([1 2 2; 2 1 2; 2 2 1])

cholesky([3 2 1; 2 4 2; 1 2 5])

cholesky([4 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4])

END
Problem 1.2⋆ (B) Recall that an inner product
Prove that
where
SOLUTION
We begin by showing that
END
Problem 1.3⋆ (A) Show that a matrix is symmetric positive definite if and only if it has a Cholesky decomposition of the form
where
SOLUTION
We didn't discuss this, but note that a symmetric positive definite matrix has strictly positive eigenvalues: for a normalised eigenvector we have
Thus they are always invertible. Then note that any such matrix has a Cholesky decomposition of standard form
Alternatively, we can replicate the procedure of computing the Cholesky decomposition beginning in the bottom right instead of the top left. Write:
The induction proceeds as in the lower triangular case.
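The flipping argument can be checked numerically: if P is the permutation that reverses a vector and P*A*P = L*L' is the standard Cholesky decomposition, then U = P*L*P is upper triangular and A = U*U'. A sketch (the SPD matrix here is illustrative):

```julia
using LinearAlgebra

A = [4.0 2 1; 2 5 2; 1 2 6]       # an illustrative SPD matrix
n = size(A, 1)
P = I(n)[n:-1:1, :]               # permutation matrix that reverses a vector
L = cholesky(Symmetric(P*A*P)).L  # standard lower-triangular Cholesky of the flipped matrix
U = P*L*P                         # flipping rows and columns makes it upper triangular
U*U' ≈ A                          # the upper-triangular Cholesky decomposition
```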
END
Problem 1.4⋆ (A) Prove that the following
Deduce its two Cholesky decompositions:
SOLUTION
We first prove that
Once we know that
Now we solve the recurrence. Substituting the second equation into the third one:
We can apply the same process to
is the permutation that reverses a vector.
So we can also flip
so that
Alternatively one can use the procedure from Problem 1.3. That is, write:
Continuing proceeds as above.
END
Problem 1.5 (B) SymTridiagonal(dv, ev) is a type for representing symmetric tridiagonal
matrices (that is, SymTridiagonal(dv, ev) == Tridiagonal(ev, dv, ev)). Complete the following
implementation of cholesky to return a Bidiagonal Cholesky factor in O(n) operations, fast enough to handle n = 1_000_000.
import LinearAlgebra: cholesky

# return a Bidiagonal L such that L*L' == A (up to machine precision)
cholesky(A::SymTridiagonal) = cholesky!(copy(A))

# return a Bidiagonal L such that L*L' == A (up to machine precision)
# You are allowed to change A
function cholesky!(A::SymTridiagonal)
    d = A.dv # diagonal entries of A
    u = A.ev # sub/super-diagonal entries of A
    T = float(eltype(A)) # return type, make float in case A has Ints
    n = length(d)
    ld = zeros(T, n)   # diagonal entries of L
    ll = zeros(T, n-1) # sub-diagonal entries of L

    ## SOLUTION
    ld[1] = sqrt(d[1])
    for k = 1:n-1
        ll[k] = u[k]/ld[k]
        ld[k+1] = sqrt(d[k+1] - ll[k]^2)
    end
    ## END
    Bidiagonal(ld, ll, :L)
end
n = 1000
A = SymTridiagonal(2*ones(n), -ones(n-1))
L = cholesky(A)

L ≈ cholesky(Matrix(A)).L

Problem 2.1⋆ (B) Prove the following:
SOLUTION
Step 1. upper bounds
Step 2.1. meeting the upper bound (
Let
Step 2.2. meeting the upper bound (
Let
Conclusion
In both cases, equality can hold, so the upper bounds are actually maxima.
END
Problem 2.2⋆ (B) For a rank-1 matrix
Hint: use the Cauchy–Schwarz inequality.
SOLUTION
By the Cauchy–Schwarz inequality,
END
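The rank-1 identity can be spot-checked numerically (a sketch, not a proof):

```julia
using LinearAlgebra

u, v = randn(5), randn(3)
A = u * v'                     # a rank-1 matrix
opnorm(A) ≈ norm(u) * norm(v)  # the 2-norm of u*v' equals ‖u‖₂‖v‖₂
```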
Problem 2.3⋆ (B) Show for any orthogonal matrix
by first showing that
SOLUTION
On the other hand,
END
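Numerically, the invariance under an orthogonal matrix Q can be spot-checked (a sketch; both the 2-norm and the Frobenius norm are preserved):

```julia
using LinearAlgebra

A = randn(4, 4)
Q = Matrix(qr(randn(4, 4)).Q)  # a random orthogonal matrix
opnorm(Q*A) ≈ opnorm(A)        # 2-norm is invariant
norm(Q*A) ≈ norm(A)            # Frobenius norm is invariant
```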
Problem 3.1⋆ (B) Show that
SOLUTION
From Problem 2.3 use the fact that
Hence,
where
Knowing that
Moreover, since if the rank of
Hence,
END
Problem 3.2 (A) Consider functions sampled on a
For which examples does the answer change when
SOLUTION
# define the functions
f₁(x,y) = (x + 2 * y) ^ 2
f₂(x,y) = cos(sin(x)*exp(y))
f₃(x,y) = 1/(x + y + 1)
f₄(x,y) = sign(x-y)
# define the error goal
error = 1e-5
# helper function to compute n×n samples
function samples(f, n)
    x = y = range(0, 1; length=n)
    return f.(x, y')
end

function find_min_rank(f, n, ϵ)
    F = samples(f, n)
    U,σ,V = svd(F)
    for k = 1:n
        Σ_k = Diagonal(σ[1:k])
        U_k = U[:,1:k]
        V_k = V[:,1:k]
        if norm(U_k * Σ_k * V_k' - F) <= ϵ
            return k
        end
    end
end
n = 100
println("Error ≤ ", error, " with n = ", n)
println("Rank for f₁ = ", find_min_rank(f₁, n, error))
println("Rank for f₂ = ", find_min_rank(f₂, n, error))
println("Rank for f₃ = ", find_min_rank(f₃, n, error))
println("Rank for f₄ = ", find_min_rank(f₄, n, error))
n = 1000
println("Error ≤ ", error, " with n = ", n)
println("Rank for f₁ = ", find_min_rank(f₁, n, error))
println("Rank for f₂ = ", find_min_rank(f₂, n, error))
println("Rank for f₃ = ", find_min_rank(f₃, n, error))
println("Rank for f₄ = ", find_min_rank(f₄, n, error))

END
Problem 3.3⋆ (B) For
Show that it satisfies the Moore-Penrose conditions:
SOLUTION
Let
END
Problem 3.4⋆ (A) Show for
SOLUTION
The proof mimics that of the QR decomposition. Write
so that
The second term is independent of
END
Problem 3.5⋆ (A)
If
SOLUTION
Let
Following the previous part we deduce:
As
which is minimised when
END
This problem sheet explores condition numbers, indefinite integration, and Euler's method.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
using LinearAlgebra, Plots

Problem 1.1⋆ (B) Prove that, if
for some constant
SOLUTION
Problem 1.2⋆ (B) Let
SOLUTION
Recalling from Problem Sheet 5 - Problem 2.3* - SOLUTION, we know that
Recalling from Problem Sheet 5 - Problem 3.1*, we have
Problem 1.3⋆ (C) Compute the 1-norm, 2-norm, and ∞-norm condition numbers for the following matrices:
(Hint: recall that the singular values of a matrix
SOLUTION
Finally, for the
so an eigenvalue
The larger eigenvalue corresponds to
we have that the singular values are the
An eigenvalue
For,
It is clear that
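These hand computations can be verified with cond from LinearAlgebra (a sketch; the matrix below is illustrative, since the sheet's matrices are not reproduced above):

```julia
using LinearAlgebra

A = [1.0 2; 3 4]                      # an illustrative matrix
cond(A, 1), cond(A, 2), cond(A, Inf)  # 1-, 2- and ∞-norm condition numbers
```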
Problem 1.4 (B)
State a bound on the relative error on
Compute the relative error in computing Av (using big to construct a high-precision version to compare against),
where v is the right-singular vector corresponding to the smallest singular value, computed via the svd command with
Float64 inputs. How does the error compare to the predicted error bound?
SOLUTION
The Theorem (relative-error for matrix vector) tells us that,
if the relative perturbation error
The condition number of the first matrix is 453.33 (see code below to compute that), and
xxxxxxxxxxusing LinearAlgebra
A = [1/3 1/5; 0 1/1000]
U,σ,V = svd(A)
κ = σ[1]/σ[end]
v = V[:,end]

A_big = [big(1)/3 big(1)/5; 0 big(1)/1000]

norm(A_big*v - A*v, 2)/norm(A_big*v, 2)

2*sqrt(2)*eps()/(2-2*eps()) * κ

B = diagm(2.0 .^ (0:-1:-10))
U,σ,V = svd(B)
κ = σ[1]/σ[end]
v = V[:,end]

B*v

Note, this is exact so the relative error is 0, within the upper bound.
10*sqrt(10)*eps()/(10-10*eps()) * 2^(10)

Problem 2.1 (B) Implement backward differences to approximate
indefinite-integration. How does the error compare to forward
and mid-point versions for
to 3 digits, where
SOLUTION
We can implement the backward difference solution as follows:
c = 0 # u(0) = 0
f = x -> cos(x)
n = 10

x = range(0, 1; length=n)
h = step(x)
A = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
ub = A\[c; f.(x[2:end])]

# adding the forward and midpoint solutions here as well for comparison
m = (x[1:end-1] + x[2:end])/2

uf = A \ [c; f.(x[1:end-1])]
um = A \ [c; f.(m)]

plot(x, sin.(x); label="sin(x)", legend=:bottomright)
scatter!(x, ub; label="backward")
scatter!(x, um; label="mid")
scatter!(x, uf; label="forward")

Comparing each method's errors, we see that the backward method has the same error as the forward method:
function indefint(x)
    h = step(x) # x[k+1]-x[k]
    n = length(x)
    L = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
end

function forward_err(u, c, f, n)
    x = range(0, 1; length = n)
    uᶠ = indefint(x) \ [c; f.(x[1:end-1])]
    norm(uᶠ - u.(x), Inf)
end

function mid_err(u, c, f, n)
    x = range(0, 1; length = n)
    m = (x[1:end-1] + x[2:end]) / 2 # midpoints
    uᵐ = indefint(x) \ [c; f.(m)]
    norm(uᵐ - u.(x), Inf)
end

function back_err(u, c, f, n)
    x = range(0, 1; length=n)
    h = step(x)
    A = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
    ub = A\[c; f.(x[2:end])]
    norm(ub - u.(x), Inf)
end

c = 0 # u(0) = 0
f = x -> cos(x)
m = (x[1:end-1] + x[2:end])/2 # midpoints
ns = 10 .^ (1:8) # solve up to n = 10 million

scatter(ns, forward_err.(sin, 0, f, ns); xscale=:log10, yscale=:log10, label="forward")
scatter!(ns, mid_err.(sin, 0, f, ns); label="mid")
scatter!(ns, back_err.(sin, 0, f, ns); label="back", alpha=0.5)
plot!(ns, ns .^ (-1); label="1/n")
plot!(ns, ns .^ (-2); label="1/n^2")

Part two:
c = 0 # u(0) = 0
n = 10000

# functions defined in the solutions to problem sheet 2
f = x -> exp(exp(x)cos(x) + sin(x))
g = x -> prod([x] ./ (1:1000) .- 1)
function cont(n, x)
    ret = 2*one(x)
    for k = 1:n-1
        ret = 2 + (x-1)/ret
    end
    1 + (x-1)/ret
end

x = range(0, 1; length=n)
h = step(x)
A = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
uf = A\[c; f.(x[2:end])]
ug = A\[c; g.(x[2:end])]
ucont = A\[c; cont.(1000, x[2:end])]

uf_int = uf[end]
ug_int = ug[end]
ucont_int = ucont[end]

println("first function: ")
println(uf_int)
println("second function: ")
println(ug_int)
println("third function: ")
println(ucont_int)

Problem 2.2 (A) Implement indefinite integration where we take the average of the two grid points:
What is the observed rate-of-convergence using the ∞-norm for
SOLUTION
The implementation is as follows:
n = 10
x = range(0, 1; length=n+1)
h = 1/n
A = Bidiagonal([1; fill(1/h, n)], fill(-1/h, n), :L)
c = 0 # u(0) = 0
f = x -> cos(x)

𝐟 = f.(x) # evaluate f at all grid points
uₙ = A \ [c; (𝐟[1:end-1] + 𝐟[2:end])/2]

plot(x, sin.(x); label="sin(x)", legend=:bottomright)
scatter!(x, uₙ; label="average grid point")

# print(norm(uₙ - sin.(x), Inf))
# norm(uₙ - sin.(x), 1)

Comparing the error to the midpoint method, we see that the errors are very similar:
function average_err(u, c, f, n)
    x = range(0, 1; length=n)
    h = step(x)
    A = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
    ua = A\[c; (f.(x[1:end-1]) + f.(x[2:end]))/2]
    norm(ua - u.(x), Inf)
end

c = 0 # u(0) = 0
f = x -> cos(x)
m = (x[1:end-1] + x[2:end])/2 # midpoints
ns = 10 .^ (1:8) # solve up to n = 10 million

scatter(ns, mid_err.(sin, 0, f, ns); xscale=:log10, yscale=:log10, label="mid")
scatter!(ns, average_err.(sin, 0, f, ns); label="average")
plot!(ns, ns .^ (-1); label="1/n")
plot!(ns, ns .^ (-2); label="1/n^2")

print(mid_err.(sin, 0, f, ns) - average_err.(sin, 0, f, ns))

Now looking at the
function average_err_l1(u, c, f, n)
    x = range(0, 1; length=n)
    h = step(x)
    A = Bidiagonal([1; fill(1/h, n-1)], fill(-1/h, n-1), :L)
    ua = A\[c; (f.(x[1:end-1]) + f.(x[2:end]))/2]
    norm(ua - u.(x), 1)
end

scatter(ns, average_err_l1.(sin, 0, f, ns); xscale=:log10, yscale=:log10, label="L_1")
scatter!(ns, average_err.(sin, 0, f, ns); label="Inf")
plot!(ns, ns .^ (-1); label="1/n")
plot!(ns, ns .^ (-2); label="1/n^2")

Problem 3.1 (B) Solve the following ODEs
using forward and/or backward Euler and increasing
If we increase the initial condition
SOLUTION
function first_eq(n)
    # take n and return the estimate of u(1) using n steps
    # define the range of t
    t = range(0, 1; length=n)
    # find the step-size h
    h = step(t)
    # preallocate memory
    u = zeros(n)
    # set initial condition
    u[1] = 1
    for k = 1:n-1
        u[k+1] = (1 + h*cos(t[k]))*u[k] + h*t[k]
    end
    u[end]
end

ns = 2 .^ (1:13)
println(first_eq.(ns)')

We see that
# define A(t)
A = t -> [0 1; cos(t) 0]

function second_eq(n)
    # take n and return the estimate of v(1) using n steps
    # define the range of t
    t = range(0, 1; length=n)
    # find the step-size h
    h = step(t)
    # preallocate memory
    u = zeros(2, n)
    # set initial condition
    u[:,1] = [1.0, 0.0]
    for k = 1:n-1
        u[:,k+1] = (I + h .* A(t[k]))*u[:,k] + h .* [0, t[k]]
    end
    u[1,end]
end

ns = 2 .^ (1:13)
println(second_eq.(ns)')

We see that
# define F(t, w)
function F(t, w)
    [w[2], t*w[1] + 2*w[1]*w[1]]
end

function third_eq(n=1000, c=1.0)
    # return the estimate of w(1) using n steps,
    # with initial condition w(0) = c (defaulting to c = 1.0)
    # define the range of t
    t = range(0, 1; length=n)
    # find the step-size h
    h = step(t)
    # preallocate memory
    w = zeros(2, n)
    # set initial condition
    w[:,1] = [c, 0.0]
    for k = 1:n-1
        w[:,k+1] = w[:,k] + h .* F(t[k], w[:,k])
    end
    w[1,end]
end

ns = 2 .^ (1:18)
println(third_eq.(ns)')

For
function third_eq(c)
    # return the estimate of w(1) with initial condition w(0) = c,
    # using a fixed n = 100000 steps
    n = 100000
    t = range(0, 1; length=n)
    h = step(t)
    w = zeros(2, n)
    w[:,1] = [c, 0.0]
    for k = 1:n-1
        w[:,k+1] = w[:,k] + h .* F(t[k], w[:,k])
    end
    w[1,end]
end

cs = 2:10
c_vals = third_eq.(cs)

It appears that
Problem 3.2⋆ (B) For an evenly spaced grid
to recast
as a lower bidiagonal linear system. Use forward-substitution to extend this to vector linear problems:
SOLUTION
We have,
\begin{align}
\frac{u_{k+1} - u_k}{h} \approx \frac{u'(t_{k+1}) + u'(t_k)}{2} = \frac{a(t_{k+1})u_{k+1} + a(t_k)u_k}{2} + \frac{1}{2}(f(t_{k+1}) + f(t_k)),
\end{align}
so we can write,
\begin{align}
\left(\frac{1}{h} - \frac{a(t_{k+1})}{2}\right)u_{k+1} + \left(-\frac{1}{h} - \frac{a(t_k)}{2}\right)u_k = \frac{1}{2}(f(t_{k+1}) + f(t_k)).
\end{align}
With the initial condition
which is lower bidiagonal.
Now if we wish to use forward substitution in a vector linear problem, we can derive in much the same way as above:
to make the update equation,
with initial value,
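Assembling the resulting lower bidiagonal system for a scalar example gives the following sketch (the coefficient a and forcing f below are illustrative, not from the sheet):

```julia
using LinearAlgebra

n = 1000
t = range(0, 1; length=n)
h = step(t)
a = t -> cos(t)   # illustrative coefficient in u' = a(t)u + f(t)
f = t -> exp(t)   # illustrative forcing
c = 1.0           # initial condition u(0) = c

# row k+1 encodes (1/h - a(t[k+1])/2) u[k+1] + (-1/h - a(t[k])/2) u[k] = (f(t[k+1]) + f(t[k]))/2
dl = [-1/h - a(t[k])/2 for k = 1:n-1]             # sub-diagonal entries
d  = [1; [1/h - a(t[k])/2 for k = 2:n]]           # diagonal; first row imposes u(0) = c
b  = [c; [(f(t[k+1]) + f(t[k]))/2 for k = 1:n-1]]
u  = Bidiagonal(d, dl, :L) \ b                    # solved by forward substitution
```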
Problem 3.3 (B) Implement the method designed in Problem 3.2 for the negative time Airy equation
on
SOLUTION
We will work with,
so that,
and with initial conditions,
We will use the method described in Problem 3.2, with
using SpecialFunctions

n = 1000
# define the range of t
t = range(0, 50; length=n)
# find the step-size h
h = step(t)
# define the function a
a = t -> [0 1; -t 0]

# initialise storage vector and set initial conditions
U = zeros(2, n)
U[:,1] = [1.0, 0.0]

# now iterate forward
for k = 1:n-1
    U[:,k+1] = (I - h/2 .* a(t[k+1])) \ ((I + h/2 .* a(t[k])) * U[:,k])
end

# solution found on Wolfram Alpha
u = t -> real(1/2 * 3^(1/6) * gamma(2/3) * (sqrt(3)airyai((-1 + 0im)^(1/3)*t) + airybi((-1+0im)^(1/3)*t)))

plot(t, u.(t), label="Airy function")
scatter!(t, U[1,:], label="uf", legend=:bottomright, markersize=2)

To see when the error goes below 1%, consider the below:
n = 2 .^ (7:14)
function relative_err(n)
    t = range(0, 50; length=n)
    # find the step-size h
    h = step(t)
    # initialise storage vector and set initial conditions
    U = zeros(2, n)
    U[:,1] = [1.0, 0.0]
    # now iterate forward
    for k = 1:n-1
        U[:,k+1] = (I - h/2 .* a(t[k+1])) \ ((I + h/2 .* a(t[k])) * U[:,k])
    end
    norm(U[1,:] - u.(t), Inf)/norm(u.(t), Inf)
end

plot(n, relative_err.(n), xscale=:log10)
plot!([0.01], seriestype=:hline)

Problem 3.4 (A) Implement Heat on a graph with
Find the time at which the ∞-norm of the solution first drops below 0.001 (hint: use println to print the answer and break to exit the for loop).
SOLUTION
Following the hint, we will begin by writing a function called heat_dissipation(T), which runs a simulation of the heat equation with the specified conditions up to time
function heat_dissipation(T)
    n = 1000
    t = range(0, T; length=n)
    m = 50
    # find the step-size h
    h = step(t)
    # define the matrix
    Δ = Tridiagonal([fill(1.0, m-2); 0], [0; fill(-2.0, m-2); 0], [0; fill(1.0, m-2)])
    # set initial conditions
    u = zeros(m, n)
    u[Int(m/2), 1] = 1
    for k = 1:n-1
        u[:,k+1] = (I - h*Δ)\u[:,k]
        u_inf = norm(u[:,k+1], Inf)
        if u_inf < 0.001
            return t[k+1]
        end
    end
    return t[n]
end

We run this on a large range of values for
Ts = 10:10:1000
ts = heat_dissipation.(Ts)

Plotting, we can clearly see that the time output by the function becomes the same towards the end of our range, so we will restrict our search to the end of this range.
plot(Ts, ts, label = "Time threshold reached", legend=:bottomright)

Zooming in:
Ts = range(900, 910, 20)
ts = heat_dissipation.(Ts)
plot(Ts, ts)

This looks promising, but it seems like the time-output is somewhat unstable even after
function heat_dissipation_large_n(T)
    n = 200000
    t = range(0, T; length=n)
    m = 50
    # find the step-size h
    h = step(t)
    # define the matrix
    Δ = Tridiagonal([fill(1.0, m-2); 0], [0; fill(-2.0, m-2); 0], [0; fill(1.0, m-2)])
    # set initial conditions
    u = zeros(m, n)
    u[Int(m/2), 1] = 1
    for k = 1:n-1
        u[:,k+1] = (I - h*Δ)\u[:,k]
        u_inf = norm(u[:,k+1], Inf)
        if u_inf < 0.001
            return t[k+1]
        end
    end
    return t[n]
end
Ts = [903, 904, 905, 906, 907]
ts = heat_dissipation_large_n.(Ts)

We can see that each time we get 902.38 to 2 decimal places, so this is our answer.
Problem 3.5 (B) Consider the equation
What behaviour do you observe on
SOLUTION
h = 0.5
t = range(0, 10; step=h)
n = length(t)
uᶠ = zeros(n)
uᵇ = zeros(n)
uᵗ = zeros(n)
uᶠ[1] = uᵇ[1] = uᵗ[1] = 1
a = -10
for k = 1:n-1
    uᶠ[k+1] = (1+h*a) * uᶠ[k]
    uᵇ[k+1] = (1-h*a) \ uᵇ[k]
    uᵗ[k+1] = (1-h*a/2) \ (1 + h*a/2) * uᵗ[k]
end

plot(t, abs.(uᶠ); yscale=:log10)
plot!(t, abs.(uᵇ); yscale=:log10)
plot!(t, abs.(uᵗ); yscale=:log10)

We observe that for the stepsize
h = 0.01
t = range(0, 10; step=h)
n = length(t)
uᶠ = zeros(n)
uᵇ = zeros(n)
uᵗ = zeros(n)
uᶠ[1] = uᵇ[1] = uᵗ[1] = 1
for k = 1:n-1
    uᶠ[k+1] = (1+h*a) * uᶠ[k]
    uᵇ[k+1] = (1-h*a) \ uᵇ[k]
    uᵗ[k+1] = (1-h*a/2) \ (1 + h*a/2) * uᵗ[k]
end
nanabs(x) = iszero(x) ? NaN : abs(x)
plot(t, nanabs.(uᶠ); yscale=:log10)
plot!(t, nanabs.(uᵇ); yscale=:log10)
plot!(t, nanabs.(uᵗ); yscale=:log10)

For a smaller stepsize (
This problem sheet explores two-point boundary value problems, convergence of finite-difference methods, and Fourier series.
Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.
using LinearAlgebra, Plots, Test
Problem 1.1 (C) Construct a finite-difference approximation to the forced Helmholtz equation
and find an
function helm(k, n)
    x = range(0, 1; length = n)
    h = step(x)
    # a SymTridiagonal discretisation of the Helmholtz operator
    T = SymTridiagonal(ones(n-2)*(-2/h^2 + k^2), ones(n-3)*1/h^2)
    u = T \ exp.(x[2:end-1])
    [0; u; 0]
end
k = 10
u = x -> (-cos(k*x) + exp(x)cos(k*x)^2 + cot(k)sin(k*x) - ℯ*cos(k)cot(k)sin(k*x) - ℯ*sin(k)sin(k*x) + exp(x)sin(k*x)^2)/(1 + k^2)

n = 2048 # n chosen large enough to get convergence
x = range(0, 1; length=n)

norm(helm(k, n) - u.(x)) ≤ 1E-4

Problem 1.2 (B) Discretisations can also be used to solve eigenvalue equations. Consider the Schrödinger equation with quadratic oscillator:
Approximate the eigenvalues using eigvals(A) (which returns the eigenvalues of a
matrix A) with
SOLUTION
L = 10
n = 1000
x = range(-L, L; length=n)
h = step(x)
eigvals(SymTridiagonal(fill(2/h^2, n-2) + x[2:end-1].^2, fill(-1/h^2, n-3)))

On inspection of the smallest values, it seems that the positive odd integers are the eigenvalues for
L = 100
n = 10000
x = range(-L, L; length = n)
h = step(x)
A = SymTridiagonal(x[2:end-1] .^ 2 .+ 2/h^2, ones(n-3) * (-1)/h^2)
sort(eigvals(A))[1:20]

Problem 1.3⋆ (A) Consider Helmholtz with Neumann conditions:
Write down the finite difference approximation approximating
Use pivoting to reduce the equation to one involving a symmetric tridiagonal matrix.
SOLUTION
We have, with
which we write in matrix form as:
which we can make symmetric tridiagonal by multiplying the first row by
Problem 2.1⋆ (B) For the equation
where
SOLUTION Using the approximation from PS6 we obtain
So we get
We want to prove that
Take
where
Note that
Now, as in the convergence theorem for Euler's method, we study consistency and stability.
Our discretisation approximates the true equation.
where
The inverse does not blow up the error.
Thus, we have
Note that
Combining stability and consistency we have
Problem 2.2⋆ (A) Consider the matrices
By writing down the inverse explicitly prove that if
Use this to prove convergence as
SOLUTION
Since
and
Then,
since, given
thus,
since
Moreover,
Hence,
Now we prove convergence for the forward Euler method as
Using equidistanced (with step
In order to study convergence we consider the limit as
As in the convergence theorem for Euler's method, we study consistency and stability.
In order to apply the theorem we note that we can define
Our discretisation approximates the true equation.
where
The inverse does not blow up the error.
First write, for
Thus, we have
Combining stability and consistency we have
Problem 3.1⋆ (C) Give explicit formulae for
Hint: You may wish to try the change of variables
SOLUTION
The explicit formulae for
For
and
similar to 1), replace cases
Either use Euler's formula to expand
Noting that
(
(
(
(
For the student unfamiliar with complex analysis:
4*) Substitute for
and finally use the Lemma "Discrete orthogonality" from the lecture notes to show the result above.
5*) Similar idea to 4) however notice that this time we only have non-zero contributions for
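The discrete orthogonality lemma invoked in 4) and 5) is easy to spot-check numerically: the average of exp(2πi k j/n) over j = 0, …, n−1 is 1 when n divides k and 0 otherwise. A sketch:

```julia
n = 8
# average of the n-th roots of unity raised to the power k
s(k) = sum(exp(2π*im*k*j/n) for j = 0:n-1)/n
abs(s(0) - 1) < 1e-14   # k ≡ 0 (mod n): every term is 1
abs(s(3)) < 1e-14       # k ≢ 0 (mod n): the terms cancel
abs(s(8) - 1) < 1e-14   # k = n is again ≡ 0 (mod n)
```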
Problem 3.2⋆ (B) Prove that if the first
Use this to show for the Taylor case (
SOLUTION A straightforward application of integration by parts yields the result
for some constant
Problem 3.3⋆ (C)
If
SOLUTION This proof is nearly identical to the proof of "Theorem (Taylor series converges)" in the lecture notes. Only now one has to also subtract the negative coefficients from the negative approximate coefficients in the chain of arguments.
Problem 3.4⋆ (B) For the general (non-Taylor) case and
to
SOLUTION
Observe that by aliasing (see corollary in lecture notes) and triangle inequality we have the following
Using the result from Problem 3.2 yields
now we pick
moreover, summing the left-hand side from
Finally let
where